 similarity map


A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning

Neural Information Processing Systems

Vision Transformer (ViT) has shown great power in learning from large-scale datasets. However, collecting sufficient data for expert knowledge is always difficult. To handle this problem, Cross-Domain Few-Shot Learning (CDFSL) has been proposed to transfer the source-domain knowledge learned from sufficient data to target domains where only scarce data is available.


Infusing Synthetic Data with Real-World Patterns for Zero-Shot Material State Segmentation

Neural Information Processing Systems

Minerals in rocks, sediment in soil, dust on surfaces, infection on leaves, stains on fabrics, and foam in liquids are a few examples of the almost infinite number of material states and patterns.




4ea14e6090343523ddcd5d3ca449695f-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems

Thus, there is a need for a reference point on which each model can be tested and from which potential improvements can be derived. In this study, we select publicly available state-of-the-art visual search models and datasets in natural scenes, and provide a common framework for their evaluation. To this end, we apply a unified format and criteria, bridging the gaps between them, and we estimate the models' efficiency and similarity with humans using a specific set of metrics.



Comprehensive Evaluation of Prototype Neural Networks

Schlinge, Philipp, Meinert, Steffen, Atzmueller, Martin

arXiv.org Artificial Intelligence

Prototype models are an important method for explainable artificial intelligence (XAI) and interpretable machine learning. In this paper, we perform an in-depth analysis of a set of prominent prototype models including ProtoPNet, ProtoPool and PIPNet. For their assessment, we apply a comprehensive set of metrics. In addition to applying standard metrics from the literature, we propose several new metrics to further complement the analysis of model interpretability. In our experiments, we apply the set of prototype models to a diverse set of datasets, including fine-grained classification, non-IID settings and multi-label classification, to further contrast their performance. Furthermore, we provide our code as an open-source library (https://github.com/uos-sis/quanproto), which facilitates simple application of the metrics themselves as well as extensibility, with the option of easily adding new metrics and models.


DIV-Nav: Open-Vocabulary Spatial Relationships for Multi-Object Navigation

Ortega-Peimbert, Jesús, Busch, Finn Lukas, Homberger, Timon, Yang, Quantao, Andersson, Olov

arXiv.org Artificial Intelligence

Advances in open-vocabulary semantic mapping and object navigation have enabled robots to perform an informed search of their environment for an arbitrary object. However, such zero-shot object navigation is typically designed for simple queries with an object name like "television" or "blue rug". Here, we consider more complex free-text queries with spatial relationships, such as "find the remote on the table", while still leveraging the robustness of a semantic map. We present DIV-Nav, a real-time navigation system that efficiently addresses this problem through a series of relaxations: i) Decomposing natural language instructions with complex spatial constraints into simpler object-level queries on a semantic map, ii) computing the Intersection of individual semantic belief maps to identify regions where all objects co-exist, and iii) Validating the discovered objects against the original, complex spatial constraints via an LVLM. We further investigate how to adapt the frontier exploration objectives of online semantic mapping to such spatial search queries to more effectively guide the search process. Robots operating in human environments must interpret natural language commands that go beyond simple object identification. While a command like "find a chair" requires handling simple object classes only, real-world search instructions often specify spatial relationships: "go to the chair next to the desk," "find the towel in the bathroom," or "get the book on the nightstand."
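
The intersection step (ii) can be illustrated with a small sketch. The abstract does not say how the belief maps are combined, so the element-wise minimum used here is an assumption, as are the names `intersect_belief_maps` and `candidate_cells`, the grid values, and the threshold; this is not DIV-Nav's implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]  # one belief value in [0, 1] per map cell


def intersect_belief_maps(maps: List[Grid]) -> Grid:
    """Combine per-object belief maps with an element-wise minimum.

    Assumption: a cell is only as plausible as its least likely object,
    so the minimum acts as a soft intersection.
    """
    if not maps:
        raise ValueError("need at least one belief map")
    rows, cols = len(maps[0]), len(maps[0][0])
    for m in maps:
        if len(m) != rows or any(len(row) != cols for row in m):
            raise ValueError("all belief maps must share one shape")
    return [[min(m[r][c] for m in maps) for c in range(cols)]
            for r in range(rows)]


def candidate_cells(joint: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells whose joint belief reaches the threshold."""
    return [(r, c)
            for r, row in enumerate(joint)
            for c, value in enumerate(row)
            if value >= threshold]


# Hypothetical belief maps for the query "find the remote on the table".
remote = [[0.1, 0.8, 0.2],
          [0.0, 0.7, 0.9]]
table = [[0.2, 0.9, 0.1],
         [0.1, 0.3, 0.8]]

joint = intersect_belief_maps([remote, table])
cells = candidate_cells(joint, threshold=0.5)
print(cells)
```

Cells that survive the threshold would then be handed to the validation step (iii); a product of beliefs would be an equally plausible combination rule under the same assumptions.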


